REPORT DOCUMENTATION PAGE


Form Approved
OMB No. 0704-0188


Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Service, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.

PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS.


1. REPORT DATE (DD-MM-YYYY)

22-06-2001


2. REPORT TYPE

Final


3. DATES COVERED (From - To)

01NOV97 - 31OCT00


4. TITLE AND SUBTITLE

Planning-Based Information Agents


6. AUTHOR(S)

Daniel S. Weld, Professor

Department of Computer Science & Engineering

University of Washington


5a. CONTRACT NUMBER


5b. GRANT NUMBER

N00014-98-1-0147

5c. PROGRAM ELEMENT NUMBER


5d. PROJECT NUMBER


5e. TASK NUMBER


5f. WORK UNIT NUMBER


7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

University of Washington

3935 University Way NE, Box 355754

Seattle, WA 98195


8. PERFORMING ORGANIZATION REPORT NUMBER


9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

Dept. of the Navy, Office of Naval Research
Seattle Regional Office
1107 NE 45th Street, #350
Seattle, WA 98105-4631


10. SPONSOR/MONITOR’S ACRONYM(S)

ONR

11. SPONSORING/MONITORING AGENCY REPORT NUMBER


12. DISTRIBUTION/AVAILABILITY STATEMENT


Approved for public release; distribution is unlimited.


14. ABSTRACT


20010/05  054 


16. SECURITY CLASSIFICATION OF:


a. REPORT b. ABSTRACT c. THIS PAGE


17. LIMITATION OF ABSTRACT


18. NUMBER OF PAGES


19a. NAME OF RESPONSIBLE PERSON

Carol Zuiches

19b. TELEPHONE NUMBER (Include area code)

206-543-4043


Standard Form 298 (Rev. 8-98)

Prescribed by ANSI Std. Z39.18


Final Report for "Planning-Based Information Agents"

Grant number N00014-98-1-0147
November 1, 1997 - October 31, 2000

Daniel S. Weld

Department of Computer Science and Engineering
University of Washington
Seattle, WA 98195
weld@cs.washington.edu


SUMMARY 

Networked information systems are making so much data available that people can no
longer find what they need unaided. Software agent technology promises to amplify human
decision-making capabilities by gathering information from disparate sources in parallel
and integrating it in real time. However, several bottlenecks must be overcome before
today’s prototype systems can realize this potential. First, information-gathering agents
need robust and efficient execution so that they can process large data sets and cope with
network failures and site outages. Second, to scale to thousands of information sources,
agents need algorithms for locating information sources, automatically creating wrappers
for those sources, processing XML-based representations of those sources, and routing
queries to the appropriate sources.

PROGRESS 

We’ve formulated the problem of wrapper induction, proved PAC bounds on the performance
of such systems, devised a number of learning algorithms that solve the problem for
different classes of sources, implemented the algorithms, and tested the implementations
empirically. Other research groups have since adopted and extended these seminal results.
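The simplest wrapper class for such sources can be made concrete. The Python sketch below assumes the left-right (LR) wrapper class, in which every attribute value is bracketed by a fixed left and right delimiter string, and learns each delimiter as the longest common suffix (left) or prefix (right) of the labeled contexts. The function names, label format, and sample page are hypothetical; the longest-common-affix heuristic stands in for the learning algorithms described above, and the sketch omits the validity checks a real learner needs (for example, that no delimiter occurs inside an attribute value).

```python
from os.path import commonprefix  # longest common prefix of a list of strings


def common_suffix(strings):
    """Longest string that every input ends with."""
    return commonprefix([s[::-1] for s in strings])[::-1]


def learn_lr(page, rows):
    """Learn one (left, right) delimiter pair per attribute.

    rows: one tuple per labeled record, holding a (start, end) character span
    for each attribute (hypothetical label format).
    """
    delims = []
    for a in range(len(rows[0])):
        lefts = [page[:row[a][0]] for row in rows]   # text before each value
        rights = [page[row[a][1]:] for row in rows]  # text after each value
        delims.append((common_suffix(lefts), commonprefix(rights)))
    return delims


def execute_lr(page, delims):
    """Extract records by scanning for each attribute's left and right delimiters in turn."""
    if any(not left or not right for left, right in delims):
        raise ValueError("empty delimiter; the labels do not support an LR wrapper")
    records, pos = [], 0
    while True:
        row = []
        for left, right in delims:
            i = page.find(left, pos)
            if i < 0:
                return records  # no further record on the page
            start = i + len(left)
            end = page.find(right, start)
            if end < 0:
                return records
            row.append(page[start:end])
            pos = end
        records.append(tuple(row))


page = ("<HTML><BODY>\n"
        "<b>Congo</b> <i>242</i><BR>\n"
        "<b>Egypt</b> <i>20</i><BR>\n"
        "<b>Belize</b> <i>501</i><BR>\n"
        "</BODY></HTML>")


def span(text):
    # Character span of the first occurrence of `text` (adequate for this toy page).
    start = page.index(text)
    return (start, start + len(text))


rows = [(span("Congo"), span("242")), (span("Egypt"), span("20")), (span("Belize"), span("501"))]
wrapper = learn_lr(page, rows)
print(execute_lr(page, wrapper))  # [('Congo', '242'), ('Egypt', '20'), ('Belize', '501')]
```

Because every record on the page shares the same formatting, three labeled rows are enough here to pin down delimiters that recover all records.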

We’ve built a prototype system that automatically identifies, classifies, and wraps over
ten thousand specialized information sources, and routes queries to them. Key ideas
include two novel methods for query routing: intelligent probing of CGI scripts to
determine their expertise, and use of the Yahoo categorization of specialized information
sources as a kind of semantic network.
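Probing-based routing can be sketched as an offline step that sends probe terms to each source and records how many hits come back, followed by a query-time step that ranks sources by their recorded expertise in the query’s topic. In this Python sketch each source is modeled as a function from a query string to a hit count (a hypothetical stand-in for submitting the query to a site’s CGI form); the topics, probe terms, and source names are illustrative assumptions, and the Yahoo category hierarchy is not modeled.

```python
from typing import Callable, Dict, List

# Hypothetical model of a source: query string -> number of hits it returns.
Source = Callable[[str], int]


def build_profiles(sources: Dict[str, Source],
                   probes: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Offline step: total hits each source returns for each topic's probe terms."""
    return {name: {topic: sum(source(term) for term in terms)
                   for topic, terms in probes.items()}
            for name, source in sources.items()}


def route(profiles: Dict[str, Dict[str, int]], topic: str, k: int = 1) -> List[str]:
    """Query-time step: up to k sources with the most probe hits for `topic`."""
    ranked = sorted(profiles, key=lambda name: profiles[name][topic], reverse=True)
    return [name for name in ranked[:k] if profiles[name][topic] > 0]


def toy_source(docs: List[str]) -> Source:
    # Counts documents containing the query as a substring (illustration only).
    return lambda query: sum(query.lower() in doc.lower() for doc in docs)


sources = {
    "recipes.example": toy_source(["chicken curry", "lentil soup", "curry paste"]),
    "movies.example": toy_source(["space opera", "romantic comedy", "curry western"]),
}
probes = {"cooking": ["curry", "soup", "paste"], "film": ["comedy", "opera", "western"]}
profiles = build_profiles(sources, probes)
print(route(profiles, "cooking"))  # ['recipes.example']
print(route(profiles, "film"))     # ['movies.example']
```

Probing once, offline, keeps query-time routing cheap: answering a query only requires a lookup in the stored profiles rather than a round trip to every source.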

We’ve built the MULDER system, which takes natural-language questions, parses them,
composes a set of Internet search engine queries of differing specificity using novel
paraphrasing technology, sends the queries to an engine such as Google, downloads likely
pages returned by the engine, parses regions of the resulting pages, extracts candidate
answers to the original question, and votes to determine which candidates are most likely
correct. MULDER outperforms commercial systems such as Google and AskJeeves. Ablation
studies show the benefit derived from each of our techniques.
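The final voting step can be illustrated with a minimal sketch. It assumes each extracted candidate arrives with an integer score, normalizes surface variants (case, punctuation, a leading article) so that they pool their votes, and ranks answers by total score. The normalization rules and scoring are assumptions for illustration, not MULDER’s actual answer-selection method.

```python
import re
from collections import Counter
from typing import List, Tuple


def normalize(answer: str) -> str:
    """Collapse surface variants: lowercase, drop punctuation and a leading article."""
    text = re.sub(r"[^\w\s]", "", answer.lower()).strip()
    return re.sub(r"^(the|a|an)\s+", "", text)


def vote(candidates: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sum each normalized answer's scores; return answers ranked by total score."""
    tally = Counter()
    for text, score in candidates:
        tally[normalize(text)] += score
    return tally.most_common()


candidates = [("Mount Everest", 3), ("mount everest.", 2), ("K2", 4), ("the Mount Everest", 1)]
print(vote(candidates))  # [('mount everest', 6), ('k2', 4)]
```

Pooling variants matters: no single "Mount Everest" candidate outscores "K2", but their combined score does.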

We’ve developed extensions to the proposed W3C standard XML query language that allow
updates to XML documents. We have implemented a dozen different update methods on a
variety of relational encodings of XML data, and performed experiments to determine which
methods work best.

We’ve implemented a highly optimized execution system for data integration. The resulting
system, Tukwila, can handle four orders of magnitude more data than its predecessor system,
Razor. Key ideas include adaptivity at all levels of the architecture, interleaved planning
and execution, and a novel double-pipelined join algorithm that greatly reduces latency
when combining data from sources connected via low- or medium-speed networks.
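The latency benefit of a double-pipelined join comes from its symmetry. Each input keeps its own hash table, and every arriving tuple is first inserted into its own side’s table and then probed against the other side’s table, so a result is emitted as soon as both matching tuples have arrived, without waiting for either input to finish. The Python sketch below assumes the two sources deliver tuples in simple round-robin alternation (real arrival order depends on the network), keeps both tables entirely in memory, and ignores the overflow handling a production system would need; the sample relations are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Any, Callable, Hashable, Iterable, Iterator, Tuple

_EXHAUSTED = object()  # filler once one input runs out


def double_pipelined_join(left: Iterable, right: Iterable,
                          left_key: Callable[[Any], Hashable],
                          right_key: Callable[[Any], Hashable]) -> Iterator[Tuple[Any, Any]]:
    """Symmetric hash join: emit each (left, right) match as soon as both tuples have arrived."""
    left_table, right_table = defaultdict(list), defaultdict(list)
    # Round-robin alternation stands in for the order in which tuples arrive.
    for l_tup, r_tup in zip_longest(left, right, fillvalue=_EXHAUSTED):
        if l_tup is not _EXHAUSTED:
            key = left_key(l_tup)
            left_table[key].append(l_tup)            # insert into own table
            for match in right_table.get(key, ()):   # probe the other table
                yield (l_tup, match)
        if r_tup is not _EXHAUSTED:
            key = right_key(r_tup)
            right_table[key].append(r_tup)
            for match in left_table.get(key, ()):
                yield (match, r_tup)


orders = [(1, "a"), (2, "b"), (1, "c")]
customers = [(1, "x"), (3, "y")]
first = lambda t: t[0]
results = list(double_pipelined_join(orders, customers, first, first))
print(results)  # [((1, 'a'), (1, 'x')), ((1, 'c'), (1, 'x'))]
```

Each matching pair is produced exactly once, by whichever of its two tuples arrives second; the first result here is available after reading only one tuple from each input.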

We’ve extended Tukwila to handle semistructured XML information natively. Our algorithms
leverage existing database technology yet incorporate novel query processing operators
(such as XScan). Detailed empirical experiments show that our methods vastly outperform
previous methods.
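Streaming XML processing of this kind can be illustrated with the Python standard library’s incremental parser. The sketch below is not XScan itself: it only matches a simple absolute tag path such as `/catalog/book/title` while the document is being parsed, yields each matching element’s text as soon as its end tag is parsed, and clears finished elements to keep memory use low. The function name, path syntax, and sample document are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Optional


def stream_path(source: BinaryIO, path: str) -> Iterator[Optional[str]]:
    """Yield the text of each element whose root-to-element tag path equals `path`."""
    target = path.strip("/").split("/")
    stack = []  # tags of the currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack == target:
                yield elem.text  # emitted as soon as this end tag is parsed
            stack.pop()
            elem.clear()  # drop children and text that are no longer needed


doc = io.BytesIO(b"<catalog><book><title>A</title><year>1999</year></book>"
                 b"<book><title>B</title></book><title>Not a book</title></catalog>")
print(list(stream_path(doc, "/catalog/book/title")))  # ['A', 'B']
```

Matching on the full tag path, rather than on the tag name alone, is what excludes the top-level `title` element.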

We’ve extended planning technology to handle interleaved query planning and execution, as
well as traditional AI planning under uncertainty.

We’ve built two new planning systems. TGP is a temporal planner that uses Graphplan-like
mutual exclusion reasoning to achieve impressive performance. LPSAT compiles resource
planning problems into a combined linear-programming/propositional satisfiability
representation, which is then solved using a novel combination of incremental simplex and
Davis-Putnam systematic SAT algorithms.
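The Graphplan-style mutual exclusion reasoning that TGP builds on can be illustrated in its classical, non-temporal form; TGP extends it to actions with durations, which this sketch does not attempt. Two actions at the same level are marked mutex if one deletes a precondition or add effect of the other (interference), or if some precondition of one is mutex with some precondition of the other at the previous level (competing needs). The `Action` record, the set-of-pairs representation of proposition mutexes, and the toy actions are assumptions of this Python sketch.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # preconditions
    add: frozenset     # add effects
    delete: frozenset  # delete effects


def interfere(a: Action, b: Action) -> bool:
    # One action deletes a precondition or an add effect of the other.
    return bool(a.delete & (b.pre | b.add)) or bool(b.delete & (a.pre | a.add))


def competing_needs(a: Action, b: Action, prop_mutex: set) -> bool:
    # Some precondition of a is mutex with some precondition of b.
    return any(frozenset((p, q)) in prop_mutex for p in a.pre for q in b.pre)


def action_mutexes(actions, prop_mutex):
    """Return the mutex action pairs (as frozensets of names) at one level."""
    return {frozenset((a.name, b.name))
            for a, b in combinations(actions, 2)
            if interfere(a, b) or competing_needs(a, b, prop_mutex)}


def act(name, pre=(), add=(), delete=()):
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


actions = [
    act("load", pre={"pkg_at_A", "truck_at_A"}, add={"pkg_in_truck"}, delete={"pkg_at_A"}),
    act("drive", pre={"truck_at_A"}, add={"truck_at_B"}, delete={"truck_at_A"}),
    act("refuel", pre={"truck_at_A"}, add={"fueled"}),
]
# drive deletes truck_at_A, which load and refuel both need; load and refuel can co-occur.
print(action_mutexes(actions, prop_mutex=set()))
```

The competing-needs rule is what propagates mutexes forward: for example, two actions whose preconditions `wet` and `dry` are mutex at one level become mutex themselves at the next.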

Finally, we’ve implemented the Tiramisu web site management system. Tiramisu separates the
design of a web site from its implementation, allowing the use of multiple implementation
tools while supporting a high-level declarative model of the site.

ACCOMPLISHMENTS 

Design, implementation, and testing of a next-generation, scalable, fully autonomous
wrapper creation system.

Implementation of a prototype web resource detector and query routing system.

Implementation and testing of the Tukwila adaptive execution system for information
integration.

Design, implementation, and experimentation with MULDER, the first fully automated
question-answering system for the WWW.

Design of an XML update language, comparative implementation of update methods, and
experimental evaluation.

Experiments showing the utility of the double-pipelined join, interleaved planning and
execution, and other Tukwila features.

Design, implementation, and experimentation with Conformant Graphplan.

Design, implementation, and experimentation with contingent Graphplan.

Design, implementation, and experimentation with factored expansion Graphplan.

Design, implementation, and experimentation with the TGP temporal planner.

Design, implementation, and experimentation with the LPSAT resource planner.

Design and implementation of the Tiramisu web-site management system.


TRANSITIONS 

My primary collaborators are Professor Oren Etzioni and Professor Alon Halevy, both at
the University of Washington, and Dr. David Smith at NASA Ames Research Center.

Our work on wrapper induction has been adopted and extended by Professor Nick
Kushmerick (a former student), now at Dublin City University, Ireland; by Dr. Steve
Minton and Dr. Craig Knoblock at ISI; and by the group of Professor Tom Mitchell at
CMU. Nimble Technology (a startup company that I co-founded with Professor Halevy) has
licensed the Tukwila data integration system. NASA is interested in fielding our
planning work. The W3C standards body is considering incorporating our XML update
methods into the next standard.

AWARDS 

I was named a AAAI Fellow for my “significant contribution to the development of
qualitative reasoning methods, software agent technology, and plan synthesis
algorithms.”

I was presented with the WRF/TJ Cable Endowed Professorship.